A Quick Verification of the 2-D Galaxy Distribution with SDSS Data 

Alexander Unzicker and Julius Fischer 

Pestalozzi-Gymnasium München, Germany 
alexander.unzicker@lrz.uni-muenchen.de 

June 9, 2008 






Abstract 

We present source code for the computer algebra system Mathematica that analyzes the distribution of nearby galaxies 
using SDSS data. Download instructions are given, so that within 10 minutes the reader can verify that galaxies are 
distributed in an essentially non-homogeneous manner and cluster on 2-dimensional structures. The short code uses a 
simple method inspired by Minkowski functionals: the distances to the next neighbors are calculated and compared to 
random distributions in three and two dimensions. The observed distance distribution clearly corresponds to the latter 
case. The paper may also help non-expert scientists to get started with SDSS data analysis. 



1 Introduction 

Even 50 years after Hubble's discovery of the expanding 
universe, the distribution of galaxies was assumed to be 
homogeneous, a seemingly obvious consequence of the 
cosmological principle. Pioneering investigations [1, 2], 
however, showed that the distribution of galaxies is anything 
but homogeneous. Rather, there seems to be a hierarchy of 
galaxy groups, clusters and superclusters that concentrate on 
two-dimensional structures, with large voids in between 
('sponge structure', [3]). There is an ongoing discussion 
whether the universe becomes homogeneous on scales larger 
than 100 Mpc, or whether it has the properties of a fractal 
with D = 2 even on larger scales [4]. 

Here we do not present any new results that help to decide 
that question, and our approach cannot compete with the 
level of detail of the experts' analyses ([5, 4, 6] and 
references therein). From the point of view of general 
scientific methodology, however, we find it desirable that 
important results of fundamental physics which require 
extensive numerical treatment can be reproduced by a broad 
public of non-expert scientists (footnote 1). In particular, 
the unique quality of the freely accessible SDSS data supports 
such an approach, which we would like to facilitate further. 
Two-point statistics are frequently used to extract 
information on the dimensionality. 

1 See, e.g. [7] for a similar approach. 



Here we use just next-neighbor statistics. Imagine spheres 
of growing radius r around each galaxy. When r reaches 
a critical radius r_c, the spheres overlap to form a connected 
manifold of the size of the whole sample (see fig. 1). 

Figure 1: Example of Minkowski functionals: at a critical 
radius, the manifold becomes connected. Picture taken 
from [8]. 

Obviously, this transition must occur much earlier (at 
a smaller r_c) when the distribution is not homogeneous in 
three dimensions. We do not fully implement this method 
of Minkowski functionals [9], but next-neighbor distances 
already yield significant information. 
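
The effect described above can be tried out on artificial points before touching any survey data. The following is a minimal sketch, not part of the authors' program: it assumes a Mathematica version that provides RandomReal, Nearest and EuclideanDistance (version 6 or later), and the names npts, pts3d, sheet and nextNeighborDistances are hypothetical. It compares the mean next-neighbor distance of points spread through a unit cube with that of the same number of points confined to a plane inside the cube.

```mathematica
(* Illustrative sketch only; sample size and seed are arbitrary choices. *)
SeedRandom[1];                               (* fixed seed for reproducibility *)
npts = 2000;                                 (* hypothetical number of points *)
pts3d = RandomReal[{0, 1}, {npts, 3}];       (* uniform in the unit cube *)
sheet = Map[Append[#, 0.5] &,
   RandomReal[{0, 1}, {npts, 2}]];           (* same cube, all points on the plane z = 0.5 *)

(* distance from each point to its nearest other point;
   nf[p, 2] returns p itself first and its next neighbor second *)
nextNeighborDistances[pts_] := Module[{nf = Nearest[pts]},
   Map[EuclideanDistance[#, Last[nf[#, 2]]] &, pts]];

{Mean[nextNeighborDistances[pts3d]], Mean[nextNeighborDistances[sheet]]}
```

The second number comes out clearly smaller than the first: at equal point number, concentrating the points on a surface shortens the typical next-neighbor distance, which is the signature the paper looks for.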

The code of about 100 lines given below reproduces the 
results given in section 3. It can easily be run with different 
data sets and future data releases of SDSS. We plan to add 
some refinements in a second version, but the reader 
should also be able to make slight modifications or extensions 
of the code. A quick description for getting started is found in 
section 5.1. Though we cannot give a detailed description 
of the program, some clarifying comments are included in 
the largely self-explanatory code (see section 5.2). 

2 Methods 

2.1 General method and limitations 

As a first approximation, we did not take into consideration 
galaxy size and morphology, which may well influence a 
more refined analysis. Spectra were used only to determine 
the distance from the redshift; H_0 was assumed to be 
72 km s^-1 Mpc^-1 [10, 11]. Though peculiar velocities cause 
errors in the radial distance, no correction for that effect 
has been attempted so far. To prevent faint galaxies from 
dropping out of the sample, we considered only redshifts 
z < 0.03, the point up to which the SDSS data show a roughly 
constant density. There is a clearly visible decay of the 
number of galaxies per volume (footnote 2) for D > 130 Mpc 
or z > 0.03 (see fig. 2; footnote 3). 

Figure 2: Number of galaxies between D and D + dD as a 
function of distance D. Constant density should lead to a 
parabolic increase of the number of galaxies with distance. 
For D > 130 Mpc or z > 0.03, a considerable percentage of 
galaxies are evidently too faint to be detected. 
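
The correspondence between z = 0.03 and roughly 130 Mpc, and the parabolic increase mentioned in the caption of fig. 2, can be checked with a few lines. This is a minimal sketch under stated assumptions: it uses the linear Hubble law with the value 72 km/s/Mpc quoted in section 2.1, whereas the listing in section 5.2 uses a relativistic Doppler formula (tor) with a constant corresponding to 70 km/s/Mpc. The names c, h0, hubbleDistance and shellCount are hypothetical, and shellCount refers to the full sky, not to the SDSS footprint.

```mathematica
(* Illustrative sketch only; not part of the authors' program. *)
c = 299792.458;                 (* speed of light in km/s *)
h0 = 72.;                       (* Hubble constant in km/s/Mpc, value quoted in section 2.1 *)

(* linear Hubble law, adequate only for z << 1 *)
hubbleDistance[z_] := c z/h0;

hubbleDistance[0.03]            (* about 125 Mpc *)

(* expected number of galaxies between d and d + dd for a constant
   number density n0 (per Mpc^3), full sky: grows with d^2 *)
shellCount[n0_, d_, dd_] := 4 Pi n0 d^2 dd;

shellCount[1, 100, 1]/shellCount[1, 50, 1]   (* doubling d quadruples the count: 4 *)
```

A measured shell count that falls below this d^2 growth, as in fig. 2 beyond about 130 Mpc, signals that the sample is no longer complete.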



2.2 Data acquisition 

The sixth data release (DR6) of the SDSS data is located 
at http://www.sdss.org/dr6 (see fig. 3). Though very 
simple data sets can be accessed by search masks, we 
strongly recommend the use of the SQL query language, 
for which the SDSS site provides very good tutorials. The 
part of the sky taken into consideration was determined 
by the SDSS coverage. We chose 140° < RA < 240° and 
30° < DEC < 60° and, as a second sample, 140° < RA < 240° 
and -2° < DEC < 11°. See fig. 4 for a 3D plot of the galaxies 
of sample 1. The redshift confidence threshold (zConf) was 
set to 0.35. The respective SQL commands for downloading 
the data are listed in appendix 5.1. 

Figure 3: Coverage of spectral data from SDSS DR6 site 

2 We do not address here the question whether such a density 
can reasonably be defined for a fractal. 

3 This picture cannot be generated by the code given below. 

Figure 4: 3D plot of the positions of the 9280 galaxies of 
sample 1. 

2.3 Data manipulation and modelling 

We used the computer algebra system Mathematica to 
convert the raw data to Euclidean coordinates, so that the 
distance to the next neighbor could be calculated easily. 
These minimal or next-neighbor distances for each galaxy 
contain the desired structure information. For a large 
number i of points, an efficient minimal-distance algorithm 
is needed, since computing all pairwise distances would make 
the computational time grow as t ~ i^2. We chose a very 
simple method that leads to t ~ i. The entire volume was 
divided by a rectangular lattice into n^3 boxes of equal 
size (footnote 4), e.g. for n = 10 into 1000 boxes. In a first 
step, each galaxy was assigned to its box. To determine the 
minimal distance of an individual galaxy, it was then 
sufficient to compute all distances within its own box and 
the 26 neighboring boxes (footnote 5). The random 
distributions were analyzed likewise, whereby in the 2D case 
an n^2 lattice with larger n was chosen (footnote 6). The 3D 
simulation consisted of a cube with the same volume as the 
real sample shown in fig. 4 and with the same galaxy density. 
For the 2D simulation, the same number of points was placed 
on a surface whose area equals that of a sphere with the same 
volume as the real sample. For computational simplicity, the 
surface was taken to be a square. 
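
The box method described above can be sketched compactly in a recent Mathematica version. This is an illustration under stated assumptions, not the authors' implementation (which is listed in section 5.2): it assumes version 10 or later for GroupBy and associations, takes points in a cube or square [0, edge]^d, and the names cellIndex and nextNeighborGrid are hypothetical. As in the text, the result equals the true next-neighbor distance only when that neighbor lies in the same or an adjacent box, i.e. when the distance is smaller than the box size edge/n.

```mathematica
(* Illustrative sketch only; works for 2D and 3D point lists. *)

(* index of the box containing point p, for n boxes per axis *)
cellIndex[p_, edge_, n_] := Clip[Ceiling[n p/edge], {1, n}];

nextNeighborGrid[pts_, edge_, n_] := Module[{cells, offsets},
   cells = GroupBy[pts, cellIndex[#, edge, n] &];     (* box index -> points in that box *)
   offsets = Tuples[{-1, 0, 1}, Length[First[pts]]];  (* 27 boxes in 3D, 9 in 2D *)
   Map[Function[p,
     With[{cand = DeleteCases[
          Join @@ Map[
            Replace[cells[cellIndex[p, edge, n] + #], _Missing -> {}] &,
            offsets], p]},                            (* neighbors, without p itself *)
      If[cand === {}, Missing["NoNeighbor"],
       Min[Map[EuclideanDistance[p, #] &, cand]]]]],
    pts]];

(* usage with hypothetical test data: 500 random points, 5 boxes per axis *)
SeedRandom[2];
testPts = RandomReal[{0, 1}, {500, 3}];
gridDist = nextNeighborGrid[testPts, 1, 5];
```

Boxes outside the lattice are simply absent from the association, so no explicit boundary test is needed. Each point is compared only with the contents of a fixed number of boxes, which is what keeps the cost roughly proportional to the number of points.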

3 Results - a preliminary analysis 

Figure 5: Sorted next-neighbor distances for the real 
distribution (black) and the simulations in 3D (dark gray) 
and 2D (light gray). Sample from the region 60° > DEC > 
30° and 240° > RA > 140°. 

4 For the real distribution, an equal angular size was used. 

5 Of course, in 2D there are 8 neighboring boxes. 

6 n does, however, influence the computational time only. 

Figure 6: As fig. 5, but for a smaller sample 11° > DEC > 
-2° and 240° > RA > 140°. 

The real sorted distances (figs. 5 and 6) clearly exclude a 
homogeneous 3D distribution, which would lead to markedly 
different next-neighbor distances. For small distances, they 
coincide well with the 2D random distribution on a surface. 
The approach towards the 3D simulation at larger distances 
could indicate homogeneity of the universe on larger scales. 
We do not know how peculiar velocities influence the result; 
the effect should, however, be limited to D < 5 Mpc ([4], 
p. 6). Next neighbors that are occasionally too faint to be 
detected could also spoil the analysis at larger distances. 
In particular, the presence of large voids in the real 
distribution may explain why the largest next-neighbor 
distances there exceed those of the random distribution. 
Therefore, as mentioned in the introduction, an interpretation 
of the present data in favor of large-scale homogeneity 
instead of a fractal dimension D = 2 would be premature. So 
far we have no explanation for the difference between the two 
samples visible in figs. 5 and 6. 
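
As a rough guide to how far apart the two simulated curves should lie, the mean next-neighbor distance of uniformly random (Poisson) points is known in closed form. The expressions below are standard results quoted here as a reminder, not derived in this paper; boundary effects are neglected, and n, sigma and N denote the number density per volume, the number density per area and the number of points.

```latex
% mean next-neighbor distance of Poisson points in 3D and 2D
\[
\langle r \rangle_{3D} = \Gamma\!\left(\tfrac{4}{3}\right)
   \left(\frac{3}{4\pi n}\right)^{1/3} \approx 0.554\, n^{-1/3},
\qquad
\langle r \rangle_{2D} = \frac{1}{2\sqrt{\sigma}} .
\]
% N points in a cube of volume V (n = N/V) versus N points on a square
% of area A = 4\pi (3V/4\pi)^{2/3} (\sigma = N/A), as in section 2.3
\[
\frac{\langle r \rangle_{2D}}{\langle r \rangle_{3D}}
   \approx 1.98\, N^{-1/6} \approx 0.43 \quad (N = 9280).
\]
```

Under these assumptions the 2D simulation of sample 1 is expected to have a mean next-neighbor distance a little less than half that of the 3D simulation.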

4 Conclusions 

The distribution of galaxies in the universe is still a riddle 
and theoretically not fully understood [5]. The discovery 
of the two-dimensional structure of the galaxy distribution 
reminds us of something deep and mysterious, such as 
Dirac's observation of a two-dimensional density in 
the universe (footnote 7). ΛCDM simulations in particular 
have problems accounting for the observed structure. 
Progress in this field is therefore very much driven by 
new high-quality observational data like SDSS. The ongoing 
discussion can thus benefit from broad accessibility of those 
data and from transparent processing that is not limited to 
a few groups. If one day the data allow a definite answer as 
to whether the large-scale structure is homogeneous or 
fractal, this must become evident also to the non-expert 
scientist. We hope this is a small step towards repeatability 
and transparency in the efforts to answer that important 
question. 

7 From Dirac's large number hypothesis it follows that the 
surface of all protons is of the same order as the surface of 
the horizon. Today this is usually considered a pure 
coincidence. 

Acknowledgement. Though we are grateful for any 
comments, please understand that we cannot guarantee 
functionality or give further support for getting this pro- 
gram to run on your computer. 

5 Appendix: data preparation and source code 



5.1 Step-by-step procedure in 10 minutes 

1. Create a directory 'sdss' and copy all the following files into it. 

2. Browse to http://www.alexander-unzicker.de/sdss1.txt and copy the file. Alternatively, copy and paste from 
the arXiv source and save as sdss1.txt. 

3. Browse to http://cas.sdss.org/dr6/en/tools/search/sql.asp 

4. Type the following SQL commands in the blank field (you may also copy and paste them from the end of the sdss1.txt 
file): 

select ra,dec,z 
from specObj 
where ra BETWEEN 140 AND 240 AND 
dec BETWEEN 30 AND 60 AND 
specClass = 2 AND 
z BETWEEN 0.001 AND 0.03 AND 
zConf > 0.35 

5. Choose file format CSV. 

6. Press submit and save the data file as sdss03.csv. 

7. Proceed likewise for dec between -2 and 11 and save as sdss03a.csv. 

8. Open a Mathematica *.nb file and run the following commands (apart from the SetDirectory command, where you 
have to put in your path, you may also copy and paste them from the end of the sdss1.txt file): 

SetDirectory["yoursdsspath"]; 
<<"sdss1.txt"; 

readData["sdss03.csv"]; (* later sdss03a.csv *) 
GalPlot[{100, 500, 700}]; 

realDistances[rpt, xyz, {20, 20, 20}, {rarg, decrg, rrg}]; (* as defined in section 5.2 *) 
randomDistances3d[vol^(1/3), Galnumber, {20, 20, 20}]; 
randomDistances2d[(3 vol/4/Pi)^(1/3) Sqrt[4 Pi], Galnumber, {90, 90}]; 
compareDistances[0.004]; (** creates one plot from 3 ***) 
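
The edge arguments passed to the two simulation routines above follow from the geometry described in section 2.3. The sketch below only spells out that arithmetic; the helper names cubeEdge, sphereRadius and squareEdge are hypothetical and do not appear in the authors' code.

```mathematica
(* Illustrative sketch only; vol is the sample volume in Mpc^3. *)
cubeEdge[vol_] := vol^(1/3);                       (* edge of a cube with volume vol *)
sphereRadius[vol_] := (3 vol/(4 Pi))^(1/3);        (* radius of a sphere with volume vol *)
squareEdge[vol_] := Sqrt[4 Pi] sphereRadius[vol];  (* square whose area equals that
                                                      sphere's surface 4 Pi R^2 *)

(* check with a sphere of radius 1, i.e. vol = 4 Pi/3 *)
{sphereRadius[4 Pi/3], squareEdge[4 Pi/3]^2}       (* {1, 4 Pi} *)
```

With these definitions, cubeEdge[vol] is the first argument of randomDistances3d and squareEdge[vol] is algebraically the same as the expression (3 vol/4/Pi)^(1/3) Sqrt[4 Pi] passed to randomDistances2d.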

5.2 Source code 

<<Statistics`DataManipulation`; 
(************** plot options *************) 
sty2={{GrayLevel[0.3], Thickness[0.01]}, {GrayLevel[0.5], 
  Thickness[0.01]}, {GrayLevel[0.8], Thickness[0.01]}}; 
(******************** constants ************************) 

cc=299792458; H0=1/(4.4081 10^(17)); Mpc=3.08567758128*10^(22); (*corresponds to 70 km/s/Mpc*) 
offset = 0.00000001; (**** to avoid division by zero ****) 
(************* functions needed ************************) 

tor[z_] := (2cc z+cc z^2)/(H0 Mpc (2+2z+z^2)); (* transformation from redshift to Mpc distance *) 
toxyz[{r_,th_,ph_}] := {r Sin[th] Cos[ph], r Sin[th] Sin[ph], r Cos[th]}; (*to cartesian coord.*) 
dist2[k1_,k2_] := Apply[Plus, (k1-k2)^2]; (*square of distance of two points k1, k2 in 3 dim.*) 

(*************** procedures **********************************) 

(********* reads SDSS data in *.csv format *****************) 

readData[infile_] := Block[{qwe, wer, wer1, ra, dec, rr, zz}, (* local variables *) 
  If[FileInformation[infile]=={}, Print["file not found."]; Goto[endlabel]]; (* check infile *) 
  qwe = Drop[Import[infile, "CSV"], 1]; wer1 = Transpose[qwe]; (* drop header line *) 
  wer = ReplacePart[wer1, offset, Position[wer1, 0]]; (* replace undesired zeros *) 
  ra = 2 Pi wer[[1]]/360; dec = 2 Pi wer[[2]]/360; zz = wer[[3]]; (* angle in radians *) 
  (*** automatic boundary determination from data, ranges of coordinates ****) 
  {ramin, decmin} = {Min[ra]-offset, Min[dec]-offset}; 
  {ramax, decmax} = {Max[ra]+offset, Max[dec]+offset}; 
  zmin = Min[zz]-offset; zmax = Max[zz]+offset; rarg = ramax-ramin; decrg = decmax-decmin; 
  zrg = zmax-zmin; meandec = (decmax+decmin)/2; 
  rrmax = tor[zmax]; rrmin = tor[zmin]; rrg = rrmax - rrmin; 
  rr = Map[tor, zz]; (*** transformation redshift - radius **) 
  rpt = Transpose[{rr, dec, ra}]; (* spherical coordinates *) 
  xyz = Map[toxyz, rpt]; (* transform to cartesian coordinates *) 
  vol = (rrmax^3-rrmin^3) 4/3 Pi rarg/(2Pi) decrg/(2Pi) Abs[Cos[meandec]]; (* mind latitude *) 
  Galnumber = Length[rpt]; GalDichte = Galnumber/vol; 
  Print["Number of galaxies: ", Galnumber]; Print["Volume in Mpc^3: ", vol]; 
  Label[endlabel]]; 

(*************** calculates real distances *************************) 
realDistances[rpt_,xyz_,unt_,rgs_] := Block[{}, tu1=TimeUsed[]; 
  {rarg, decrg, rrg} = rgs; (* ranges *) 
  box = mindist = Table[{}, {unt[[1]]}, {unt[[2]]}, {unt[[3]]}]; 
  (* empty variable for the boxes and minimal distances *) 
  (* assign each galaxy to its box *) 
  (* though boxes are defined by polar coordinates, they contain cartesian ones *) 
  For[i = 1, i <= Galnumber, i++, AppendTo[box[[Ceiling[(rpt[[i, 1]] - rrmin)/rrg*unt[[1]]], 
      Ceiling[(rpt[[i, 2]] - decmin)/decrg*unt[[2]]], 
      Ceiling[(rpt[[i, 3]] - ramin)/rarg*unt[[3]]]]], xyz[[i]]]]; 
  tu2=TimeUsed[]; Print["assign to boxes... ", tu2-tu1, "s"]; 

  (**** distance in box i,j,k to all other galaxies in the box and in neighboring boxes ******) 
  For[i = 1, i <= unt[[1]], i++, 
   For[j = 1, j <= unt[[2]], j++, 
    For[k = 1, k <= unt[[3]], k++, 
     For[m = 1, m <= Length[box[[i, j, k]]], m++, distances={}; 
      (* now go through neighboring boxes ii, jj, kk *) 
      For[ii = i-1, ii <= i+1, ii++, 
       For[jj = j-1, jj <= j+1, jj++, 
        For[kk = k-1, kk <= k+1, kk++, 
         If[(ii==0 || jj==0 || kk==0 || ii==unt[[1]]+1 || jj==unt[[2]]+1 || kk==unt[[3]]+1), 
          Continue, 
          If[box[[ii, jj, kk]] != {}, AppendTo[distances, Table[dist2[box[[i, j, k, m]], box[[ii, jj, kk, mm]]], 
             {mm, Length[box[[ii, jj, kk]]]}]]]]; ]]]; (* avoid pathologic cases like empty boxes *) 
      If[(distances != {} && Flatten[distances] != {0.}), 
       AppendTo[mindist[[i, j, k]], Sort[Flatten[distances]][[2]]]]; (* [[1]] is the zero self-distance *) 
      ]]]]; 

  tu1=TimeUsed[]; Print["measuring... ", tu1-tu2, "s"]; 
  distToNext = Flatten[mindist]; rowOfDist = Sort[distToNext]; 
  ListPlot[rowOfDist, PlotJoined -> True, AxesLabel -> {"Number", "Mpc"}]]; 

(******* calculates distances of random distributions in 3D *****************) 
randomDistances3d[edge_, nn_, unt_] := 
 Block[{box, mindist, distances, distToNext, i, j, k, m}, tu1=TimeUsed[]; 
  xyz = Table[edge {Random[], Random[], Random[]}, {i, nn}]; 
  box = mindist = Table[{}, {unt[[1]]}, {unt[[2]]}, {unt[[3]]}]; 
  (* empty variable for the boxes and minimal distances *) 
  (* assign each galaxy to its box *) 
  For[i = 1, i <= nn, i++, AppendTo[box[[Ceiling[(xyz[[i, 1]])/edge*unt[[1]]], 
      Ceiling[(xyz[[i, 2]])/edge*unt[[2]]], Ceiling[(xyz[[i, 3]])/edge*unt[[3]]]]], xyz[[i]]]]; 
  tu2=TimeUsed[]; Print["assign to boxes... ", tu2-tu1, "s"]; 
  For[i = 1, i <= unt[[1]], i++, 
   For[j = 1, j <= unt[[2]], j++, 
    For[k = 1, k <= unt[[3]], k++, 
     For[m = 1, m <= Length[box[[i, j, k]]], m++, distances={}; 
      (* now go through neighboring boxes ii, jj, kk *) 
      For[ii = i-1, ii <= i+1, ii++, 
       For[jj = j-1, jj <= j+1, jj++, 
        For[kk = k-1, kk <= k+1, kk++, 
         If[(ii==0 || jj==0 || kk==0 || ii==unt[[1]]+1 || jj==unt[[2]]+1 || kk==unt[[3]]+1), 
          Continue, 
          If[box[[ii, jj, kk]] != {}, AppendTo[distances, Table[dist2[box[[i, j, k, m]], 
              box[[ii, jj, kk, mm]]], {mm, Length[box[[ii, jj, kk]]]}]]]]; 
         ]]]; 
      If[(distances != {} && Flatten[distances] != {0.}), 
       AppendTo[mindist[[i, j, k]], Sort[Flatten[distances]][[2]]]]; 
      ]]]]; 

  tu1=TimeUsed[]; Print["measuring... ", tu1-tu2, "s"]; 
  distToNext = Flatten[mindist]; 
  rowOfDist3 = Sort[distToNext]; 
  ListPlot[rowOfDist3, PlotJoined -> True, AxesLabel -> {"Number", "Mpc"}]]; 
(******* calculates distances of random distributions in 2D *****************) 
randomDistances2d[edge_, nn_, unt_] := 
 Block[{box, mindist, distances, distToNext, i, j, m}, tu1=TimeUsed[]; 
  xy = Table[edge {Random[], Random[]}, {i, nn}]; 
  box = mindist = Table[{}, {unt[[1]]}, {unt[[2]]}]; 
  (* empty variable for the boxes and minimal distances *) 
  (* assign each galaxy to its box *) 
  For[i = 1, i <= nn, i++, AppendTo[box[[Ceiling[(xy[[i, 1]])/edge*unt[[1]]], 
      Ceiling[(xy[[i, 2]])/edge*unt[[2]]]]], xy[[i]]]]; 
  tu2=TimeUsed[]; Print["assign to boxes... ", tu2-tu1, "s"]; 
  For[i = 1, i <= unt[[1]], i++, 
   For[j = 1, j <= unt[[2]], j++, 
    For[m = 1, m <= Length[box[[i, j]]], m++, distances={}; 
     (* now go through neighboring boxes ii, jj *) 
     For[ii = i-1, ii <= i+1, ii++, 
      For[jj = j-1, jj <= j+1, jj++, 
       If[(ii==0 || jj==0 || ii==unt[[1]]+1 || jj==unt[[2]]+1), Continue, 
        If[box[[ii, jj]] != {}, AppendTo[distances, Table[dist2[box[[i, j, m]], 
            box[[ii, jj, mm]]], {mm, Length[box[[ii, jj]]]}]]]]; 
       ]]; 
     If[(distances != {} && Flatten[distances] != {0.}), 
      AppendTo[mindist[[i, j]], Sort[Flatten[distances]][[2]]]]; 
     ]]]; 

  tu1=TimeUsed[]; Print["measuring... ", tu1-tu2, "s"]; 
  distToNext = Flatten[mindist]; 
  rowOfDist2 = Sort[distToNext]; 
  ListPlot[rowOfDist2, PlotJoined -> True, AxesLabel -> {"Number", "Mpc"}]]; 

(***************** plot of galaxy distribution **************************) 

GalPlot[vp_ (* choose appropriate ViewPoint *)] := Show[Graphics3D[Map[Point, xyz]], 
  Boxed -> False, ViewPoint -> vp, Prolog -> AbsolutePointSize[0.0005]]; 

(***************** combined plot of the three sorted distance lists ******) 

compareDistances[th_] := Block[{}, $DefaultFont = {"Arial", 8}; 
  lp1 = ListPlot[rowOfDist, PlotStyle -> {Thickness[th], GrayLevel[0]}, 
    PlotJoined -> True, DisplayFunction -> Identity]; 
  lp2 = ListPlot[rowOfDist2, PlotStyle -> {Thickness[th], GrayLevel[0.8]}, 
    PlotJoined -> True, DisplayFunction -> Identity]; 
  lp3 = ListPlot[rowOfDist3, PlotStyle -> {Thickness[th], GrayLevel[0.5]}, 
    PlotJoined -> True, DisplayFunction -> Identity]; 
  distrib = Show[lp1, lp2, lp3, DisplayFunction -> $DisplayFunction, 
    AspectRatio -> .6, AxesLabel -> {"Number", "Mpc"}]]; 